Simplifying metaphorical language for young readers: A corpus study on news text
Abstract
The paper presents first results of an ongoing project on text simplification focusing on linguistic metaphors. Based on an analysis of a parallel corpus of news text professionally simplified for different grade levels, we identify six types of simplification choices falling into two broad categories: preserving metaphors or dropping them. We conduct an annotation study on almost 300 source sentences with metaphors (grade level 12) and their simplified counterparts (grade 4). The results show that most metaphors are preserved; when a metaphor is dropped, its semantic content tends to be kept but reworded without metaphorical language. In general, some of the expected tendencies in complexity reduction, measured with psycholinguistic variables linked to metaphor comprehension, are observed, suggesting good prospects for machine learning-based metaphor simplification.
1 Motivation and problem statement
Text simplification is the process of meaning-preserving reduction of discourse complexity whose purpose is to adapt text for specific populations of readers, for instance, children or language learners. The idea has been around since “My Weekly Reader” in the 1920s and Palmer’s work (1932), and over the past 20 years it has attracted the attention of the computational linguistics community. While broadly interpreted “lexical simplification” – in general understood as substitution of “difficult” words with “simpler” ones – is a common component of automated simplification systems (see, for instance, Siddharthan, 2014), studies of text simplification dedicated to specific lexis-related semantic phenomena are lacking.
One class of such understudied phenomena comprises those related to figurative language. This is a surprising gap in simplification research, considering that metaphors have been shown to cause difficulties in text comprehension and that developing metaphor interpretation competence is a complex developmental process (for an overview, see, for instance, Winner, 1997). Since automated systems are trained on corpora of simplified text, understanding patterns of metaphor simplification based on corpus data could help improve simplification models. In this paper we present a study that is our first step in this direction. We analyze linguistic metaphors in a corpus of news texts professionally simplified for different grade levels. While the editors’ guidelines instructed them to avoid vivid metaphors, such as “paint into a corner”, our goal was to find out whether, and if so how, linguistic metaphors in general are simplified by professional editors. Since ultimately we want to build automated metaphor simplification models, the purpose of this study is to investigate whether metaphors in a corpus of professionally simplified text, that is, potential training data, are simplified in systematic ways. Specifically, we were interested in two questions:
1) What types of discourse modifications do editors perform when simplifying metaphorical language? In other words, can a well-defined set of classes for the metaphor simplification task be specified?
2) Do professional editors simplify metaphor phenomena in systematic ways? If not, training simplification models using machine learning based on corpus data may not be promising.
The paper’s structure follows the data-driven methodology adopted for this study: We first define the criteria used to identify the phenomenon in question: linguistic metaphor. Next, we present
Similar resources
Metaphorical Conceptualization of SPORT Through TERRITORY as a Vehicle
WAR as a vehicle and Sport Is War as a conceptual metaphor (CM) seem inadequate to account metaphorically for SPORT. To cater for an inclusive vehicle/CM, we selected WIN and LOSS lexicon from the news coverage of Brazil’s football team loss to Germany and tested them through the Corpus of Contemporary American English. Then, the data were studied through the 3 stages of metaphor research. In t...
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Text simplification for language learners: a corpus analysis
Simplified texts are commonly used by teachers and students in bilingual education and other language-learning contexts. These texts are usually manually adapted, and teachers say this is a time-consuming and sometimes challenging task. Our goal is the development of tools to aid teachers by automatically proposing ways to simplify texts. As a first step, this paper presents a detailed analysis ...
Assessing Conformance of Manually Simplified Corpora with User Requirements: the Case of Autistic Readers
In the state of the art, there are scarce resources available to support development and evaluation of automatic text simplification (TS) systems for specific target populations. These comprise parallel corpora consisting of texts in their original form and in a form that is more accessible for different categories of target reader, including neurotypical second language learners and young read...